[proof-of-principle] allow metadata to be fetched from an API #1207
Nextstrain (Auspice) currently requires all data within the main JSON (or sidecar JSONs such as frequencies). This has a number of benefits, but it makes certain use-cases hard.
One such case I've encountered periodically, but more regularly with COVID, is where metadata which is not integral to the phylogenetic tree (e.g. some epi metadata) is updated, resulting in the Nextstrain dataset being out-of-date. This is currently solved by rerunning augur (often `augur export` is all that's needed if the intermediate files have been stored). However, the groups maintaining the metadata are often separate from those running the bioinformatics, so an extra communication step is required and there's a period where things are out of sync. This seems especially true for epi data, which are often subject to updates & amendments as the situation unfolds. What if we could define this metadata somewhere else, like a google sheet?
Here I've implemented a proof-of-principle to explore the feasibility of storing this metadata outside of the auspice JSON and instead having Auspice fetch it via an API.
The dataset viewable at https://auspice-fetch-metadata-3t0frfv.herokuapp.com/zika-tutorial-metadata-via-api is the zika-tutorial but without the region defined in the tree in the JSON. The coloring metadata looks like:
The actual region metadata is stored at the publicly accessible google sheet https://docs.google.com/spreadsheets/d/10x5-h2_zpjMWoW-m4SAY69KVKdi4AXX0Tw1n-aTjons/edit#gid=0.
When you change the color-by to Region, the data is fetched from that google sheet and then displayed by Auspice. This decouples the storage of certain bits of metadata from the JSONs.
For those with access to the nextstrain google drive, you can modify a value in that sheet, refresh auspice, and see the updated values! (Note: Currently, you have to explicitly change the color-by dropdown to region to get this functionality, but that's easily fixable).
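To make the idea concrete, here is a minimal sketch (in Python rather than Auspice's JavaScript, and not the code in this PR) of the merge step: externally stored rows are indexed by strain name and written onto the matching tree nodes. It assumes the sheet has been exported as CSV with `strain` and `region` columns, and that nodes follow the v2 dataset layout of `name`, `node_attrs` and `children`. The function names, column names and strain names are all hypothetical, and how the sheet is actually fetched is left out.

```python
import csv
import io


def parse_sheet_csv(csv_text, key_column="strain"):
    """Index the rows of a CSV export by strain name (column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_column]: row for row in reader}


def merge_trait(root, rows_by_strain, trait):
    """Write rows_by_strain[name][trait] onto each matching node.

    Walks the tree iteratively and returns the number of nodes updated.
    Nodes missing from the sheet, or with an empty value, are left untouched.
    """
    updated = 0
    stack = [root]
    while stack:
        node = stack.pop()
        row = rows_by_strain.get(node.get("name"))
        if row and row.get(trait):
            # Existing node_attrs (e.g. num_date) are preserved.
            node.setdefault("node_attrs", {})[trait] = {"value": row[trait]}
            updated += 1
        stack.extend(node.get("children", []))
    return updated


# Hypothetical data standing in for the sheet and the tree from the JSON.
sheet_csv = "strain,region\nstrain_A,North America\nstrain_B,South America\n"
tree = {
    "name": "NODE_0000001",
    "children": [
        {"name": "strain_A", "node_attrs": {"num_date": {"value": 2015.9}}},
        {"name": "strain_B"},
        {"name": "strain_not_in_sheet"},
    ],
}

rows = parse_sheet_csv(sheet_csv)
n_updated = merge_trait(tree, rows, "region")
```

Keying on the strain name means the sheet only needs one identifier column plus the traits it owns. Strains absent from the sheet simply keep whatever the JSON already defines.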
Notes